Add plaintext client as alternative to SSH. #198

putnopvut · 2020-05-28T20:33:35Z

This offers an alternative to the SSH client currently used during scale
testing.

This instead connects to a server using plaintext and sends commands
this way. It is assumed that a server is running on the server on port 8000
Because ncat is a good example of such a server, the client is called
NCatClient in this change.

The reason for this new method is to eliminate the overhead of
connecting over SSH from the measurements in tests. For reasonably
large-scale tests (~300 nodes) it results in a bout a 7% reduction in
execution time.

One notable difference between this new method and using SSH is that
connections are created per host, rather than per sandbox. This allows
for clients that connect to sandboxes hosted on the same machine to
share the same connection. This has only been tested with tests that
execute serially. Tests that run scenarios in parallel may encounter
issues.

If there is no server listening, and a connection cannot be created, we
fall back to using SSH.

This offers an alternative to the SSH client currently used during scale testing. This instead connects to a server using plaintext and sends commands this way. It is assumed that a server is running on the server on port 8000 Because ncat is a good example of such a server, the client is called NCatClient in this change. The reason for this new method is to eliminate the overhead of connecting over SSH from the measurements in tests. For reasonably large-scale tests (~300 nodes) it results in a bout a 7% reduction in execution time. One notable difference between this new method and using SSH is that connections are created per host, rather than per sandbox. This allows for clients that connect to sandboxes hosted on the same machine to share the same connection. This has only been tested with tests that execute serially. Tests that run scenarios in parallel may encounter issues. If there is no server listening, and a connection cannot be created, we fall back to using SSH.

hzhou8

The commit message mentioned: "One notable difference between this new method and using SSH is that connections are created per host, rather than per sandbox. ..." For my understanding the SSH connections are per host (farm node), too, not per sandbox. cc @l8huang

Otherwise LGTM. (Please rebase)

l8huang · 2020-05-31T03:00:08Z

rally_ovs/plugins/ovs/utils.py

+    try:
+        global NCAT_CLIENT_CACHE
+        server = cred["host"]
+        return NCAT_CLIENT_CACHE.setdefault(server, NCatClient(server))


Could you please add a option for selecting client? It's better to keep ssh client as default instead of a fallback. There could be some security concerns about using plain text client in corp network.

Absolutely, I can do that.

l8huang · 2020-05-31T03:06:11Z

For my understanding the SSH connections are per host (farm node), too, not per sandbox.

@hzhou8 correct, SSH connections are per farm node.

@putnopvut Thanks for this performance improvement. Could you please add some documents about server-side settings for using this NCatClient?

putnopvut · 2020-06-01T11:50:00Z

Sure, in the next revision, I'll add documentation about the server side.

My understanding was that there could be multiple farm nodes per physical server. For instance, you might have 10 farm nodes implemented as containers on a single physical server. In other words, the credential['host'] and credential['password'] of all 10 farm nodes would be the same values. Each of these farm nodes would then use a separate SSH connection, even though they're all connecting to the same physical server. With the plaintext client, I've set it up so that those 10 farm nodes will share the same connection instead. This reduces the number of connections required during the test since all farm nodes can share the same connection.

However, this has the drawback that if you are running parallel scenarios, this will cause test failures. Scenarios that are run in parallel can potentially receive messages intended for the other running scenarios.

hzhou8 · 2020-06-03T02:19:40Z

rally_ovs/plugins/ovs/utils.py

+            raise NCatError(details)
+
+    def close(self):
+        # Test scenarios call close after every operation because with SSH,


For my understanding this is not true for SSH. We don't close connections for every operation. SSH connection is cached for every farm node, which models a physical server. For each new operation, it doesn't recreate new SSH connection, but just reusing existing connection to the node. Connections are closed when the test is done and the test program exits.

OK, I have an explanation for this :)

This patch was originally made against https://github.com/dceara/ovn-scale-test/tree/ovn-switch-per-node-devel . When I discussed with Dumitru about pushing this change, we agreed to open a PR here first and then we could port it to that particular branch after.

On that branch, we have added close() calls in a lot of places. We ran into issues where tests would fail after a while. We could no longer open new SSH connections because the servers would reject incoming connections due to having too many open. We added close() calls after basically every SSH command. My comment here makes sense in the context of how that branch operates, but here it's just wrong. I'll correct it.

By the way, if you're wondering why we are running into this problem of too many SSH connections, it likely has to do with our different understanding of farm nodes. As I said in another comment, we configure our tests to have multiple farm nodes per physical machine. So the number of SSH connections can pile up more quickly than if each farm node maps to a single physical machine.

@putnopvut sorry for the confusion with the branches in my fork. Branch ovn-switch-per-node-devel is stale. What we use downstream is ovn-switch-per-node and in that branch I removed all the close() calls because we addressed the issues we had, i.e., use runner type serial instead of a single runner that would connect to all farm nodes. Like this only the connections used in one of the iteration of the runner will be open (and will get cleaned up when the iteration ends).

I'll try to open PRs for all the stuff that's only in my fork as soon as I get the chance. I hope that will clear things out.

Thanks for the explain @putnopvut and @dceara

hzhou8 · 2020-06-03T02:26:51Z

My understanding was that there could be multiple farm nodes per physical server. For instance, you might have 10 farm nodes implemented as containers on a single physical server. In other words, the credential['host'] and credential['password'] of all 10 farm nodes would be the same values. Each of these farm nodes would then use a separate SSH connection, even though they're all connecting to the same physical server. With the plaintext client, I've set it up so that those 10 farm nodes will share the same connection instead. This reduces the number of connections required during the test since all farm nodes can share the same connection.

Sorry that we didn't thought about farm node that way. A farm node models a physical server, and each physical server runs N sandboxes or containers that simulate hypervisors. So in what use case would need multiple farm nodes on a same physical server?

On the other hand, assume there is an use case, but how does ovn-scale-test tell which farm nodes are on same physical server so that they can share same connection? Can SSH use the same mechanism to share connections between multiple farm nodes on same physical server? Why would plaintext connection be different from SSH on that front?

putnopvut · 2020-06-03T12:28:49Z

My understanding was that there could be multiple farm nodes per physical server. For instance, you might have 10 farm nodes implemented as containers on a single physical server. In other words, the credential['host'] and credential['password'] of all 10 farm nodes would be the same values. Each of these farm nodes would then use a separate SSH connection, even though they're all connecting to the same physical server. With the plaintext client, I've set it up so that those 10 farm nodes will share the same connection instead. This reduces the number of connections required during the test since all farm nodes can share the same connection.

Sorry that we didn't thought about farm node that way. A farm node models a physical server, and each physical server runs N sandboxes or containers that simulate hypervisors. So in what use case would need multiple farm nodes on a same physical server?

In our case, we are using ovn-fake-multinode to set up containers representing the farm nodes. In our rally deployment, we specify "install_method": "physical" for our farm nodes, rather than "sandbox" or "docker", along with a "host_container" setting. To see an example of what this does, have a look at rally_ovs/plugins/ovs/ovs_clientsimpl.py::_OvnNbctl.run() . It prepends whatever command we are to run with sudo docker exec {host_container}.

On the other hand, assume there is an use case, but how does ovn-scale-test tell which farm nodes are on same physical server so that they can share same connection? Can SSH use the same mechanism to share connections between multiple farm nodes on same physical server? Why would plaintext connection be different from SSH on that front?

The ovn_multihost context gathers data from the deployment (see get_ovn_multihost_info), including ssh credentials. By referencing this context data, we can tell if multiple farm nodes have the same ssh host.

I think SSH could use this same mechanism if desired. My mindset with creating this patch was to focus on the plaintext client, not to add features to the SSH client. However, I can see how this could be frustrating to have two clients that operate differently for seemingly no reason. For my next iteration on this pull request, I'll make the plaintext client work the way SSH does now. I can make a later change to alter both SSH and plaintext to share connections between farm nodes if possible.

hzhou8 · 2020-06-04T00:16:49Z

My understanding was that there could be multiple farm nodes per physical server. For instance, you might have 10 farm nodes implemented as containers on a single physical server. In other words, the credential['host'] and credential['password'] of all 10 farm nodes would be the same values. Each of these farm nodes would then use a separate SSH connection, even though they're all connecting to the same physical server. With the plaintext client, I've set it up so that those 10 farm nodes will share the same connection instead. This reduces the number of connections required during the test since all farm nodes can share the same connection.

Sorry that we didn't thought about farm node that way. A farm node models a physical server, and each physical server runs N sandboxes or containers that simulate hypervisors. So in what use case would need multiple farm nodes on a same physical server?

In our case, we are using ovn-fake-multinode to set up containers representing the farm nodes. In our rally deployment, we specify "install_method": "physical" for our farm nodes, rather than "sandbox" or "docker", along with a "host_container" setting. To see an example of what this does, have a look at rally_ovs/plugins/ovs/ovs_clientsimpl.py::_OvnNbctl.run() . It prepends whatever command we are to run with sudo docker exec {host_container}.

On the other hand, assume there is an use case, but how does ovn-scale-test tell which farm nodes are on same physical server so that they can share same connection? Can SSH use the same mechanism to share connections between multiple farm nodes on same physical server? Why would plaintext connection be different from SSH on that front?

The ovn_multihost context gathers data from the deployment (see get_ovn_multihost_info), including ssh credentials. By referencing this context data, we can tell if multiple farm nodes have the same ssh host.

I think SSH could use this same mechanism if desired. My mindset with creating this patch was to focus on the plaintext client, not to add features to the SSH client. However, I can see how this could be frustrating to have two clients that operate differently for seemingly no reason. For my next iteration on this pull request, I'll make the plaintext client work the way SSH does now. I can make a later change to alter both SSH and plaintext to share connections between farm nodes if possible.

Make sense! Thanks for the explanation.

putnopvut · 2020-06-08T12:53:50Z

Sorry for the slow turnaround on this, but essentially I've been down a rabbit hole on this. When it was suggested to add configuration for plaintext vs. ssh, it got me to look into how the best way to configure it was. It looks like the deployment is the proper place to do it. And from there, I found out that I had missed some key uses of SSH in the code, and that SSH use is baked in more heavily than I had previously thought.

hzhou8 requested changes May 29, 2020

View reviewed changes

l8huang reviewed May 31, 2020

View reviewed changes

hzhou8 reviewed Jun 3, 2020

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add plaintext client as alternative to SSH. #198

Add plaintext client as alternative to SSH. #198

putnopvut commented May 28, 2020

hzhou8 left a comment

l8huang May 31, 2020

putnopvut Jun 1, 2020

l8huang commented May 31, 2020

putnopvut commented Jun 1, 2020

hzhou8 Jun 3, 2020

putnopvut Jun 3, 2020

dceara Jun 3, 2020 •

edited

Loading

hzhou8 Jun 4, 2020

hzhou8 commented Jun 3, 2020

putnopvut commented Jun 3, 2020 •

edited

Loading

hzhou8 commented Jun 4, 2020

putnopvut commented Jun 8, 2020

Add plaintext client as alternative to SSH. #198

Are you sure you want to change the base?

Add plaintext client as alternative to SSH. #198

Conversation

putnopvut commented May 28, 2020

hzhou8 left a comment

Choose a reason for hiding this comment

l8huang May 31, 2020

Choose a reason for hiding this comment

putnopvut Jun 1, 2020

Choose a reason for hiding this comment

l8huang commented May 31, 2020

putnopvut commented Jun 1, 2020

hzhou8 Jun 3, 2020

Choose a reason for hiding this comment

putnopvut Jun 3, 2020

Choose a reason for hiding this comment

dceara Jun 3, 2020 • edited Loading

Choose a reason for hiding this comment

hzhou8 Jun 4, 2020

Choose a reason for hiding this comment

hzhou8 commented Jun 3, 2020

putnopvut commented Jun 3, 2020 • edited Loading

hzhou8 commented Jun 4, 2020

putnopvut commented Jun 8, 2020

dceara Jun 3, 2020 •

edited

Loading

putnopvut commented Jun 3, 2020 •

edited

Loading